Translating Collocations for Bilingual Lexicons: A Statistical Approach

نویسندگان

  • Frank Smadja
  • Kathleen McKeown
  • Vasileios Hatzivassiloglou
چکیده

Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and cannot be translated on a word-by-word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations. Our goal is to provide a tool for compiling bilingual lexical information above the word level in multiple languages,for different domains. The algorithm we use is based on statistical methods and produces p-word translations of n-word collocations in which n and p need not be the same. For example, Champollion translates make ... decision, employment equity, and stock market into prendre ... d6cision, 6quit6 en mati6re d'emploi, and bourse respectively. Testing Champollion on three years' worth of the Hansards corpus yielded the French translations of 300 collocations for each year, evaluated at 73% accuracy on average. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Translating Collocations for Use in Bilingual Lexicons

Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and can not be translated on a word by word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages, automatically produces translations of an input list of collocations. Our goal is to provide a tool to compile bilingual lexica...

متن کامل

Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language

This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons between language pairs Lf–Lp and Lp–Le, we assume these lexicons as parallel corpora. Then, we merge the extracted two phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...

متن کامل

Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System

We describe in this paper a hybrid approach to build automatically bilingual lexicons of Multiword Expressions (MWEs) from parallel corpora. We more specifically investigate the impact of using a domain-specific bilingual lexicon of MWEs on domain adaptation of an Example-Based Machine Translation (EBMT) system. We conducted experiments on the English-French language pair and two kinds of texts...

متن کامل

Collocational Translation Memory Extraction Based on Statistical and Linguistic Information

In this paper, we propose a new method for extracting bilingual collocations from a parallel corpus to provide phrasal translation memories. The method integrates statistical and linguistic information to achieve effective extraction of bilingual collocations. The linguistic information includes parts of speech, chunks, and clauses. The method involves first obtaining an extended list of Englis...

متن کامل

Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses

In this paper, we describe an algorithm that employs syntactic and statistical analysis to extract bilingual collocations from a parallel corpus. The preferred syntactic patterns are obtained from idioms and collocations in a machine-readable dictionary. Phrases matching the patterns are extract from aligned sentences in a parallel corpus. Those phrases are subsequently matched up via cross-lin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Linguistics

دوره 22  شماره 

صفحات  -

تاریخ انتشار 1996